Overview

Dataset statistics

Number of variables: 12
Number of observations: 99003
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 9.1 MiB
Average record size in memory: 96.0 B

Variable types

NUM: 11
CAT: 1

Reproduction

Analysis started: 2020-06-17 11:35:53.069932
Analysis finished: 2020-06-17 11:38:31.077607
Duration: 2 minutes and 38.01 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
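
The reported duration follows directly from the two timestamps. A minimal sketch with Python's standard `datetime` module reproduces it; the variable names are illustrative and not part of pandas-profiling:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2020-06-17 11:35:53.069932", FMT)
finished = datetime.strptime("2020-06-17 11:38:31.077607", FMT)

# Split the elapsed time into whole minutes and remaining seconds.
elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```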

Warnings

mobile_likes_received is highly correlated with likes_received [High correlation]
likes_received is highly correlated with mobile_likes_received and 1 other fields [High correlation]
www_likes_received is highly correlated with likes_received [High correlation]
likes_received is highly skewed (γ1 = 112.0745682) [Skewed]
mobile_likes_received is highly skewed (γ1 = 107.5312999) [Skewed]
www_likes_received is highly skewed (γ1 = 126.257317) [Skewed]
userid has unique values [Unique]
friend_count has 1962 (2.0%) zeros [Zeros]
friendships_initiated has 2997 (3.0%) zeros [Zeros]
likes has 22308 (22.5%) zeros [Zeros]
likes_received has 24428 (24.7%) zeros [Zeros]
mobile_likes has 35056 (35.4%) zeros [Zeros]
mobile_likes_received has 30003 (30.3%) zeros [Zeros]
www_likes has 60999 (61.6%) zeros [Zeros]
www_likes_received has 36864 (37.2%) zeros [Zeros]
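
The skewness (γ1) and zero-share figures behind these warnings can be illustrated with a short sketch. It uses the plain moment-based definition of skewness; the report's own estimator may apply a small-sample bias correction, so treat this as an approximation. The function names and the toy list are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Toy sample (not the real data): many small counts plus one extreme value,
# the same shape that triggers the Skewed and Zeros warnings.
toy = [0, 0, 0, 1, 1, 2, 3, 5, 8, 400]
print(skewness(toy), zero_share(toy))

# The zero percentage reported for friend_count: 1962 zeros in 99003 rows.
friend_count_zero_pct = round(100 * 1962 / 99003, 1)
```

A single very large observation is enough to push γ1 well above zero, which is consistent with the maxima listed for the `*_received` variables.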

Variables

userid
Real number (ℝ≥0)

UNIQUE

Distinct count: 99003
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1597045.2079128916
Minimum: 1000008
Maximum: 2193542
Zeros: 0
Zeros (%): 0.0%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 1000008
5-th percentile: 1060618.3
Q1: 1298805.5
Median: 1596148
Q3: 1895744
95-th percentile: 2133357.1
Maximum: 2193542
Range: 1193534
Interquartile range (IQR): 596938.5

Descriptive statistics

Standard deviation: 344059.1775
Coefficient of variation (CV): 0.2154348391
Kurtosis: -1.199556831
Mean: 1597045.208
Median Absolute Deviation (MAD): 298438
Skewness: 0.0001076605667
Sum: 1.581122667e+11
Variance: 1.183767176e+11
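
Several of the descriptive statistics are simple functions of one another. The sketch below recomputes the coefficient of variation, the variance and the sum from the mean, standard deviation and row count reported for userid; small differences in the last digits are expected because the inputs are already rounded.

```python
# Figures copied from the userid block of the report.
n = 99003
mean = 1597045.2079128916
std = 344059.1775

cv = std / mean        # coefficient of variation = standard deviation / mean
variance = std ** 2    # variance = standard deviation squared
total = mean * n       # sum = mean * number of observations

print(f"CV={cv:.10f} variance={variance:.9e} sum={total:.9e}")
```
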
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
1159224               1      < 0.1%
1129202               1      < 0.1%
1055510               1      < 0.1%
1855227               1      < 0.1%
2110369               1      < 0.1%
1991449               1      < 0.1%
2128666               1      < 0.1%
1884335               1      < 0.1%
2082123               1      < 0.1%
1026848               1      < 0.1%
Other values (98993)  98993  > 99.9%

Value                 Count  Frequency (%)
1000008               1      < 0.1%
1000013               1      < 0.1%
1000015               1      < 0.1%
1000038               1      < 0.1%
1000059               1      < 0.1%

Value                 Count  Frequency (%)
2193542               1      < 0.1%
2193538               1      < 0.1%
2193522               1      < 0.1%
2193499               1      < 0.1%
2193485               1      < 0.1%

age
Real number (ℝ≥0)

Distinct count: 101
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.28022383160106
Minimum: 13
Maximum: 113
Zeros: 0
Zeros (%): 0.0%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 13
5-th percentile: 15
Q1: 20
Median: 28
Q3: 50
95-th percentile: 90
Maximum: 113
Range: 100
Interquartile range (IQR): 30
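
The range and IQR are derived from the other quantile rows: range = maximum − minimum and IQR = Q3 − Q1. The sketch below shows one common way to compute quantiles, linear interpolation between order statistics; that the report uses exactly this rule is an assumption. The five-value sample is a toy chosen so that its quartiles coincide with the reported ones; it is not the real age column.

```python
def percentile(sorted_values, p):
    """Quantile at fraction p (0..1), interpolating linearly between neighbours."""
    pos = (len(sorted_values) - 1) * p
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


# Hypothetical toy sample, already sorted.
ages = [13, 20, 28, 50, 113]

q1 = percentile(ages, 0.25)
median = percentile(ages, 0.50)
q3 = percentile(ages, 0.75)
iqr = q3 - q1
value_range = ages[-1] - ages[0]
print(q1, median, q3, iqr, value_range)
```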

Descriptive statistics

Standard deviation: 22.58974831
Coefficient of variation (CV): 0.6059445462
Kurtosis: 1.561446767
Mean: 37.28022383
Median Absolute Deviation (MAD): 10
Skewness: 1.415260654
Sum: 3690854
Variance: 510.2967289
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
18                    5196   5.2%
23                    4404   4.4%
19                    4391   4.4%
20                    3769   3.8%
21                    3671   3.7%
25                    3641   3.7%
17                    3283   3.3%
16                    3086   3.1%
22                    3032   3.1%
24                    2827   2.9%
Other values (91)     61703  62.3%

Value                 Count  Frequency (%)
13                    484    0.5%
14                    1925   1.9%
15                    2618   2.6%
16                    3086   3.1%
17                    3283   3.3%

Value                 Count  Frequency (%)
113                   202    0.2%
112                   18     < 0.1%
111                   18     < 0.1%
110                   15     < 0.1%
109                   9      < 0.1%

gender
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 773.5 KiB

Value                 Count  Frequency (%)
male                  58749  59.3%
female                40254  40.7%

Length

Max length: 6
Median length: 4
Mean length: 4.813187479
Min length: 4
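
The length statistics follow from the two category labels and their counts: "male" has 4 characters and "female" has 6, so the mean length is a count-weighted average. A minimal sketch, using the counts from the frequency table above (variable names are illustrative):

```python
# Category counts copied from the gender frequency table.
counts = {"male": 58749, "female": 40254}

total = sum(counts.values())
# Weight each label's character length by how often it occurs.
mean_length = sum(len(label) * n for label, n in counts.items()) / total
min_length = min(len(label) for label in counts)
max_length = max(len(label) for label in counts)

print(total, round(mean_length, 9), min_length, max_length)
```

Because more than half of the rows are "male", the median length is 4 as well.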

tenure
Real number (ℝ≥0)

Distinct count: 2426
Unique (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 537.8848317727745
Minimum: 0.0
Maximum: 3139.0
Zeros: 70
Zeros (%): 0.1%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 47
Q1: 226
Median: 412
Q3: 675
95-th percentile: 1575
Maximum: 3139
Range: 3139
Interquartile range (IQR): 449

Descriptive statistics

Standard deviation: 457.645601
Coefficient of variation (CV): 0.8508245147
Kurtosis: 2.199181661
Mean: 537.8848318
Median Absolute Deviation (MAD): 213
Skewness: 1.535709166
Sum: 53252212
Variance: 209439.4961
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
300                   173    0.2%
303                   170    0.2%
242                   164    0.2%
272                   163    0.2%
257                   161    0.2%
297                   161    0.2%
285                   160    0.2%
280                   160    0.2%
284                   158    0.2%
278                   158    0.2%
Other values (2416)   97375  98.4%

Value                 Count  Frequency (%)
0                     70     0.1%
1                     60     0.1%
2                     72     0.1%
3                     79     0.1%
4                     86     0.1%

Value                 Count  Frequency (%)
3139                  3      < 0.1%
3129                  1      < 0.1%
3128                  1      < 0.1%
3101                  1      < 0.1%
3019                  1      < 0.1%

friend_count
Real number (ℝ≥0)

ZEROS

Distinct count: 2562
Unique (%): 2.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 196.3507873498783
Minimum: 0
Maximum: 4923
Zeros: 1962
Zeros (%): 2.0%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 31
Median: 82
Q3: 206
95-th percentile: 720
Maximum: 4923
Range: 4923
Interquartile range (IQR): 175

Descriptive statistics

Standard deviation: 387.304229
Coefficient of variation (CV): 1.972511719
Kurtosis: 50.09427289
Mean: 196.3507873
Median Absolute Deviation (MAD): 64
Skewness: 6.059008484
Sum: 19439317
Variance: 150004.5658
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
0                     1962   2.0%
1                     1816   1.8%
2                     1117   1.1%
3                     860    0.9%
5                     789    0.8%
4                     749    0.8%
10                    737    0.7%
24                    732    0.7%
6                     720    0.7%
29                    719    0.7%
Other values (2552)   88802  89.7%

Value                 Count  Frequency (%)
0                     1962   2.0%
1                     1816   1.8%
2                     1117   1.1%
3                     860    0.9%
4                     749    0.8%

Value                 Count  Frequency (%)
4923                  1      < 0.1%
4917                  1      < 0.1%
4863                  1      < 0.1%
4845                  1      < 0.1%
4844                  1      < 0.1%

friendships_initiated
Real number (ℝ≥0)

ZEROS

Distinct count: 1519
Unique (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 107.45247113723826
Minimum: 0
Maximum: 4144
Zeros: 2997
Zeros (%): 3.0%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 17
Median: 46
Q3: 117
95-th percentile: 418
Maximum: 4144
Range: 4144
Interquartile range (IQR): 100

Descriptive statistics

Standard deviation: 188.786951
Coefficient of variation (CV): 1.756934475
Kurtosis: 42.53560096
Mean: 107.4524711
Median Absolute Deviation (MAD): 36
Skewness: 5.150757415
Sum: 10638117
Variance: 35640.51287
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
0                     2997   3.0%
1                     2212   2.2%
2                     1551   1.6%
3                     1355   1.4%
4                     1352   1.4%
6                     1328   1.3%
5                     1328   1.3%
11                    1319   1.3%
8                     1314   1.3%
13                    1279   1.3%
Other values (1509)   82968  83.8%

Value                 Count  Frequency (%)
0                     2997   3.0%
1                     2212   2.2%
2                     1551   1.6%
3                     1355   1.4%
4                     1352   1.4%

Value                 Count  Frequency (%)
4144                  1      < 0.1%
3654                  1      < 0.1%
3594                  1      < 0.1%
3538                  1      < 0.1%
3415                  1      < 0.1%

likes
Real number (ℝ≥0)

ZEROS

Distinct count: 2924
Unique (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 156.07878549134875
Minimum: 0
Maximum: 25111
Zeros: 22308
Zeros (%): 22.5%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 11
Q3: 81
95-th percentile: 726
Maximum: 25111
Range: 25111
Interquartile range (IQR): 80

Descriptive statistics

Standard deviation: 572.2806808
Coefficient of variation (CV): 3.666614134
Kurtosis: 200.4456878
Mean: 156.0787855
Median Absolute Deviation (MAD): 11
Skewness: 11.02370356
Sum: 15452268
Variance: 327505.1777
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
0                     22308  22.5%
1                     6928   7.0%
2                     4434   4.5%
3                     3240   3.3%
4                     2507   2.5%
5                     2027   2.0%
6                     1806   1.8%
7                     1618   1.6%
8                     1430   1.4%
9                     1381   1.4%
Other values (2914)   51324  51.8%

Value                 Count  Frequency (%)
0                     22308  22.5%
1                     6928   7.0%
2                     4434   4.5%
3                     3240   3.3%
4                     2507   2.5%

Value                 Count  Frequency (%)
25111                 1      < 0.1%
21652                 1      < 0.1%
16732                 1      < 0.1%
16583                 1      < 0.1%
14799                 1      < 0.1%

likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 2681
Unique (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 142.6893629485975
Minimum: 0
Maximum: 261197
Zeros: 24428
Zeros (%): 24.7%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 8
Q3: 59
95-th percentile: 561
Maximum: 261197
Range: 261197
Interquartile range (IQR): 58

Descriptive statistics

Standard deviation: 1387.919613
Coefficient of variation (CV): 9.726861091
Kurtosis: 17384.94
Mean: 142.6893629
Median Absolute Deviation (MAD): 8
Skewness: 112.0745682
Sum: 14126675
Variance: 1926320.851
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
0                     24428  24.7%
1                     7305   7.4%
2                     4541   4.6%
3                     3347   3.4%
4                     2669   2.7%
5                     2373   2.4%
6                     1873   1.9%
7                     1680   1.7%
8                     1538   1.6%
9                     1351   1.4%
Other values (2671)   47898  48.4%

Value                 Count  Frequency (%)
0                     24428  24.7%
1                     7305   7.4%
2                     4541   4.6%
3                     3347   3.4%
4                     2669   2.7%

Value                 Count  Frequency (%)
261197                1      < 0.1%
178166                1      < 0.1%
152014                1      < 0.1%
106025                1      < 0.1%
82623                 1      < 0.1%

mobile_likes
Real number (ℝ≥0)

ZEROS

Distinct count: 2396
Unique (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 106.11629950607558
Minimum: 0
Maximum: 25111
Zeros: 35056
Zeros (%): 35.4%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 4
Q3: 46
95-th percentile: 481.9
Maximum: 25111
Range: 25111
Interquartile range (IQR): 46

Descriptive statistics

Standard deviation: 445.2529851
Coefficient of variation (CV): 4.195896268
Kurtosis: 360.9885806
Mean: 106.1162995
Median Absolute Deviation (MAD): 4
Skewness: 14.16123656
Sum: 10505832
Variance: 198250.2207
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
0                     35056  35.4%
1                     6297   6.4%
2                     3941   4.0%
3                     2917   2.9%
4                     2265   2.3%
5                     1794   1.8%
6                     1598   1.6%
7                     1395   1.4%
8                     1212   1.2%
9                     1149   1.2%
Other values (2386)   41379  41.8%

Value                 Count  Frequency (%)
0                     35056  35.4%
1                     6297   6.4%
2                     3941   4.0%
3                     2917   2.9%
4                     2265   2.3%

Value                 Count  Frequency (%)
25111                 1      < 0.1%
21652                 1      < 0.1%
16732                 1      < 0.1%
14039                 1      < 0.1%
13529                 1      < 0.1%

mobile_likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 2004
Unique (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 84.1204912982435
Minimum: 0
Maximum: 138561
Zeros: 30003
Zeros (%): 30.3%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 4
Q3: 33
95-th percentile: 317
Maximum: 138561
Range: 138561
Interquartile range (IQR): 33

Descriptive statistics

Standard deviation: 839.8894437
Coefficient of variation (CV): 9.984362083
Kurtosis: 15522.64932
Mean: 84.1204913
Median Absolute Deviation (MAD): 4
Skewness: 107.5312999
Sum: 8328181
Variance: 705414.2777
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
0                     30003  30.3%
1                     8243   8.3%
2                     4948   5.0%
3                     3608   3.6%
4                     2944   3.0%
5                     2383   2.4%
6                     2022   2.0%
7                     1745   1.8%
8                     1521   1.5%
9                     1437   1.5%
Other values (1994)   40149  40.6%

Value                 Count  Frequency (%)
0                     30003  30.3%
1                     8243   8.3%
2                     4948   5.0%
3                     3608   3.6%
4                     2944   3.0%

Value                 Count  Frequency (%)
138561                1      < 0.1%
131244                1      < 0.1%
89911                 1      < 0.1%
73333                 1      < 0.1%
43410                 1      < 0.1%

www_likes
Real number (ℝ≥0)

ZEROS

Distinct count: 1726
Unique (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49.96242538104906
Minimum: 0
Maximum: 14865
Zeros: 60999
Zeros (%): 61.6%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 7
95-th percentile: 208
Maximum: 14865
Range: 14865
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 285.5601519
Coefficient of variation (CV): 5.715498191
Kurtosis: 449.1484832
Mean: 49.96242538
Median Absolute Deviation (MAD): 0
Skewness: 16.91102529
Sum: 4946430
Variance: 81544.60033
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
0                     60999  61.6%
1                     4697   4.7%
2                     2760   2.8%
3                     1948   2.0%
4                     1419   1.4%
5                     1202   1.2%
6                     1081   1.1%
7                     897    0.9%
8                     792    0.8%
9                     757    0.8%
Other values (1716)   22451  22.7%

Value                 Count  Frequency (%)
0                     60999  61.6%
1                     4697   4.7%
2                     2760   2.8%
3                     1948   2.0%
4                     1419   1.4%

Value                 Count  Frequency (%)
14865                 1      < 0.1%
12903                 1      < 0.1%
11077                 1      < 0.1%
10763                 1      < 0.1%
10627                 1      < 0.1%

www_likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count: 1636
Unique (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 58.56883124753795
Minimum: 0
Maximum: 129953
Zeros: 36864
Zeros (%): 37.2%
Memory size: 773.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 2
Q3: 20
95-th percentile: 227
Maximum: 129953
Range: 129953
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 601.416348
Coefficient of variation (CV): 10.26853934
Kurtosis: 23812.2491
Mean: 58.56883125
Median Absolute Deviation (MAD): 2
Skewness: 126.257317
Sum: 5798490
Variance: 361701.6237
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
0                     36864  37.2%
1                     8513   8.6%
2                     5111   5.2%
3                     3586   3.6%
4                     2828   2.9%
5                     2317   2.3%
6                     1918   1.9%
7                     1602   1.6%
8                     1445   1.5%
9                     1373   1.4%
Other values (1626)   33446  33.8%

Value                 Count  Frequency (%)
0                     36864  37.2%
1                     8513   8.6%
2                     5111   5.2%
3                     3586   3.6%
4                     2828   2.9%

Value                 Count  Frequency (%)
129953                1      < 0.1%
62103                 1      < 0.1%
39605                 1      < 0.1%
39213                 1      < 0.1%
34039                 1      < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exactly linear relationship the steepness of the line (its angle to the x-axis) does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
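
As a rough illustration of this definition, the sketch below computes r in plain Python as the covariance divided by the product of the standard deviations. The function name and the toy data are hypothetical; this is not the implementation the report itself uses.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


# Toy data, not taken from the profiled dataset.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)

# Shifting and rescaling either variable (by a positive factor) leaves r unchanged.
r_rescaled = pearson_r([10 * x + 3 for x in xs], [0.5 * y - 7 for y in ys])
print(round(r, 4), round(r_rescaled, 4))
```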

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
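
A minimal sketch of that recipe: replace each value by its rank (tied values share the average of their positions, which is an assumption about tie handling), then apply the Pearson formula to the ranks. All names and data here are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank-transformed variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear toy relationship: y = x ** 3.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(rho, r)
```

On this toy data the ranks of the two variables are identical, so ρ reaches its maximum while r stays below it, which is the behaviour the paragraph above describes.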

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
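
The pair-counting definition can be sketched directly. The version below is the simple form often called tau-a, which leaves tied pairs out of the numerator but makes no further adjustment for them; the figure in the report may come from a tie-adjusted variant, so treat this as an illustration only. The function name and data are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Toy data: two of the three pairs are concordant, one is discordant.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```

The double loop over all pairs is quadratic in the number of rows, so this form is only practical for small samples.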

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

index  userid   age  gender  tenure  friend_count  friendships_initiated  likes  likes_received  mobile_likes  mobile_likes_received  www_likes  www_likes_received
0      2094382  14   male    266.0   0             0                      0      0               0             0                      0          0
1      1192601  14   female  6.0     0             0                      0      0               0             0                      0          0
2      2083884  14   male    13.0    0             0                      0      0               0             0                      0          0
3      1203168  14   female  93.0    0             0                      0      0               0             0                      0          0
4      1733186  14   male    82.0    0             0                      0      0               0             0                      0          0
5      1524765  14   male    15.0    0             0                      0      0               0             0                      0          0
6      1136133  13   male    12.0    0             0                      0      0               0             0                      0          0
7      1680361  13   female  0.0     0             0                      0      0               0             0                      0          0
8      1365174  13   male    81.0    0             0                      0      0               0             0                      0          0
9      1712567  13   male    171.0   0             0                      0      0               0             0                      0          0

Last rows

index  userid   age  gender  tenure  friend_count  friendships_initiated  likes  likes_received  mobile_likes  mobile_likes_received  www_likes  www_likes_received
98993  1654565  19   male    394.0   4538          4144                   4501   15088           4435          5961                   66         9127
98994  2063006  20   female  402.0   1988          332                    7351   106025          7248          73333                  103        32692
98995  1132164  20   female  699.0   3611          973                    4507   7768            4414          6909                   93         859
98996  1668695  24   female  182.0   2938          1272                   6018   17765           5843          11708                  175        6057
98997  1458985  28   female  290.0   2218          1618                   4626   10268           4290          4250                   336        6018
98998  1268299  68   female  541.0   2118          341                    3996   18089           3505          11887                  491        6202
98999  1256153  18   female  21.0    1968          1720                   4401   13412           4399          10592                  2          2820
99000  1195943  15   female  111.0   2002          1524                   11959  12554           11959         11462                  0          1092
99001  1468023  23   female  416.0   2560          185                    4506   6516            4506          5760                   0          756
99002  1397896  39   female  397.0   2049          768                    9410   12443           9410          9530                   0          2913